AITopics | model selection criterion

Collaborating Authors

model selection criterion

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards a Theoretical Framework of Out-of-Distribution Generalization

Neural Information Processing SystemsDec-24-2025, 21:28:11 GMT

Generalization to out-of-distribution (OOD) data is one of the central problems in modern machine learning. Recently, there is a surge of attempts to propose algorithms that mainly build upon the idea of extracting invariant features. Although intuitively reasonable, theoretical understanding of what kind of invariance can guarantee OOD generalization is still limited, and generalization to arbitrary out-of-distribution is clearly impossible. In this work, we take the first step towards rigorous and quantitative definitions of 1) what is OOD; and 2) what does it mean by saying an OOD problem is learnable. We also introduce a new concept of expansion function, which characterizes to what extent the variance is amplified in the test domains over the training domains, and therefore give a quantitative meaning of invariant features.

name change, out-of-distribution generalization, theoretical framework, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.84)

Add feedback

Towards a Theoretical Framework of Out-of-Distribution Generalization

Neural Information Processing SystemsJan-19-2025, 02:48:29 GMT

model selection criterion, out-of-distribution generalization, theoretical framework, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.92)

Add feedback

Leveraging free energy in pretraining model selection for improved fine-tuning

Munn, Michael, Wei, Susan

arXiv.org Artificial IntelligenceOct-7-2024

Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers. These models are first pretrained on vast and diverse datasets and then adapted to specific downstream tasks, often with significantly less data. However, the mechanisms behind the success of this ubiquitous pretrain-then-adapt paradigm remain underexplored, particularly the characteristics of pretraining checkpoints that lend themselves to good downstream adaptation. We introduce a Bayesian model selection criterion, called the downstream free energy, which quantifies a checkpoint's adaptability by measuring the concentration of nearby favorable parameters for the downstream task. We demonstrate that this free energy criterion can be effectively implemented without access to the downstream data or prior knowledge of the downstream task. Furthermore, we provide empirical evidence that the free energy criterion reliably correlates with improved fine-tuning performance, offering a principled approach to predicting model adaptability. The advent of foundation models has significantly reshaped the landscape of modern machine learning (Bommasani et al., 2021).

checkpoint, downstream free energy, free energy, (12 more...)

arXiv.org Artificial Intelligence

2410.05612

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
(2 more...)

Add feedback

On the Laplace Approximation as Model Selection Criterion for Gaussian Processes

Besginow, Andreas, Hüwel, Jan David, Pawellek, Thomas, Beecks, Christian, Lange-Hegermann, Markus

arXiv.org Machine LearningMar-14-2024

Model selection aims to find the best model in terms of accuracy, interpretability or simplicity, preferably all at once. In this work, we focus on evaluating model performance of Gaussian process models, i.e. finding a metric that provides the best trade-off between all those criteria. While previous work considers metrics like the likelihood, AIC or dynamic nested sampling, they either lack performance or have significant runtime issues, which severely limits applicability. We address these challenges by introducing multiple metrics based on the Laplace approximation, where we overcome a severe inconsistency occuring during naive application of the Laplace approximation. Experiments show that our metrics are comparable in quality to the gold standard dynamic nested sampling without compromising for computational speed. Our model selection criteria allow significantly faster and high quality model selection of Gaussian process models.

approximation, laplace approximation, model evidence, (12 more...)

arXiv.org Machine Learning

2403.09215

Country:

Europe > Germany (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Probabilistic Truly Unordered Rule Sets

Yang, Lincen, van Leeuwen, Matthijs

arXiv.org Artificial IntelligenceJan-18-2024

Rule set learning has recently been frequently revisited because of its interpretability. Existing methods have several shortcomings though. First, most existing methods impose orders among rules, either explicitly or implicitly, which makes the models less comprehensible. Second, due to the difficulty of handling conflicts caused by overlaps (i.e., instances covered by multiple rules), existing methods often do not consider probabilistic rules. Third, learning classification rules for multi-class target is understudied, as most existing methods focus on binary classification or multi-class classification via the ``one-versus-rest" approach. To address these shortcomings, we propose TURS, for Truly Unordered Rule Sets. To resolve conflicts caused by overlapping rules, we propose a novel model that exploits the probabilistic properties of our rule sets, with the intuition of only allowing rules to overlap if they have similar probabilistic outputs. We next formalize the problem of learning a TURS model based on the MDL principle and develop a carefully designed heuristic algorithm. We benchmark against a wide range of rule-based methods and demonstrate that our method learns rule sets that have lower model complexity and highly competitive predictive performance. In addition, we empirically show that rules in our model are empirically ``independent" and hence truly unordered.

algorithm, dataset, overlap, (16 more...)

arXiv.org Artificial Intelligence

2401.09918

Country: Europe > Netherlands > South Holland > Leiden (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Neural Network Model Selection Using Asymptotic Jackknife Estimator and Cross-Validation Method

Neural Information Processing SystemsApr-6-2023, 19:13:04 GMT

Two theorems and a lemma are presented about the use of jackknife es(cid:173) timator and the cross-validation method for model selection. Theorem 1 gives the asymptotic form for the jackknife estimator. Combined with the model selection criterion, this asymptotic form can be used to obtain the fit of a model. The model selection criterion we used is the negative of the average predictive likehood, the choice of which is based on the idea of the cross-validation method. Lemma 1 provides a formula for further explo(cid:173) ration of the asymptotics of the model selection criterion.

jackknife estimator and cross-validation method, model selection criterion, neural network model selection, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.97)

Add feedback

Convex Covariate Clustering for Classification

Andrade, Daniel, Fukumizu, Kenji, Okajima, Yuzuru

arXiv.org Machine LearningMar-5-2019

Clustering, like covariate selection for classification, is an important step to understand and interpret the data. However, clustering of covariates is often performed independently of the classification step, which can lead to undesirable clustering results. Therefore, we propose a method that can cluster covariates while taking into account class label information of samples. We formulate the problem as a convex optimization problem which uses both, a-priori similarity information between covariates, and information from class-labeled samples. Like convex clustering [Chi and Lange, 2015], the proposed method offers a unique global minima making it insensitive to initialization. In order to solve the convex problem, we propose a specialized alternating direction method of multipliers (ADMM), which scales up to several thousands of variables. Furthermore, in order to circumvent computationally expensive cross-validation, we propose a model selection criterion based on approximate marginal likelihood estimation. Experiments on synthetic and real data confirm the usefulness of the proposed clustering method and the selection criterion.

artificial intelligence, covariate, machine learning, (14 more...)

arXiv.org Machine Learning

1903.0168

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

On an improvement of LASSO by scaling

Hagiwara, Katsuyuki

arXiv.org Machine LearningAug-22-2018

A sparse modeling is a major topic in machine learning and statistics. LASSO (Least Absolute Shrinkage and Selection Operator) is a popular sparse modeling method while it has been known to yield unexpected large bias especially at a sparse representation. There have been several studies for improving this problem such as the introduction of non-convex regularization terms. The important point is that this bias problem directly affects model selection in applications since a sparse representation cannot be selected by a prediction error based model selection even if it is a good representation. In this article, we considered to improve this problem by introducing a scaling that expands LASSO estimator to compensate excessive shrinkage, thus a large bias in LASSO estimator. We here gave an empirical value for the amount of scaling. There are two advantages of this scaling method as follows. Since the proposed scaling value is calculated by using LASSO estimator, we only need LASSO estimator that is obtained by a fast and stable optimization procedure such as LARS (Least Angle Regression) under LASSO modification or coordinate descent. And, the simplicity of our scaling method enables us to derive SURE (Stein's Unbiased Risk Estimate) under the modified LASSO estimator with scaling. Our scaling method together with model selection based on SURE is fully empirical and do not need additional hyper-parameters. In a simple numerical example, we verified that our scaling method actually improves LASSO and the SURE based model selection criterion can stably choose an appropriate sparse model.

artificial intelligence, lasso, machine learning, (17 more...)

arXiv.org Machine Learning

1808.0726

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Model Selection for Gaussian Process Regression by Approximation Set Coding

Fischer, Benjamin, Gorbach, Nico, Bauer, Stefan, Bian, Yatao, Buhmann, Joachim M.

arXiv.org Machine LearningOct-4-2016

Gaussian processes are powerful, yet analytically tractable models for supervised learning. A Gaussian process is characterized by a mean function and a covariance function (kernel), which are determined by a model selection criterion. The functions to be compared do not just differ in their parametrization but in their fundamental structure. It is often not clear which function structure to choose, for instance to decide between a squared exponential and a rational quadratic kernel. Based on the principle of approximation set coding, we develop a framework for model selection to rank kernels for Gaussian process regression. In our experiments approximation set coding shows promise to become a model selection criterion competitive with maximum evidence (also called marginal likelihood) and leave-one-out cross-validation.

approximation, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

1610.00907

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Data-Driven Learning of the Number of States in Multi-State Autoregressive Models

Ding, Jie, Noshad, Mohammad, Tarokh, Vahid

arXiv.org Machine LearningOct-9-2015

In this work, we consider the class of multi-state autoregressive processes that can be used to model non-stationary time-series of interest. In order to capture different autoregressive (AR) states underlying an observed time series, it is crucial to select the appropriate number of states. We propose a new model selection technique based on the Gap statistics, which uses a null reference distribution on the stable AR filters to check whether adding a new AR state significantly improves the performance of the model. To that end, we define a new distance measure between AR filters based on mean squared prediction error (MSPE), and propose an efficient method to generate random stable filters that are uniformly distributed in the coefficient space. Numerical results are provided to evaluate the performance of the proposed approach.

artificial intelligence, machine learning, reference curve, (14 more...)

arXiv.org Machine Learning

1506.02107

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback